Feature Selection by Nonparametric Bayes Error Minimization

Authors

  • Shuang-Hong Yang
  • Bao-Gang Hu
Abstract

This paper presents an algorithmic framework for feature selection, which selects a subset of features by minimizing the nonparametric Bayes error. A set of existing algorithms, as well as new ones, can be derived naturally from this framework. For example, we show that the Relief algorithm greedily attempts to minimize the Bayes error estimated by the k-nearest-neighbor (k-NN) method. This new interpretation not only reveals the secret behind Relief but also offers various opportunities to improve it or to establish new alternatives. In particular, we develop a new feature weighting algorithm, named Parzen-Relief, which minimizes the Bayes error estimated by the Parzen window method. Additionally, to enhance its ability to handle imbalanced and multiclass data, we integrate the class distribution with the max-margin objective function, leading to a new algorithm, named MAP-Relief. Comparisons on benchmark data sets confirm the effectiveness of the proposed algorithms.
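
To make the k-NN connection concrete, below is a minimal Python sketch of the classic Relief update in its binary-class form (after Kira and Rendell). The function name and sampling details are illustrative assumptions; this is only the greedy step that, in the paper's reading, descends on a k-NN estimate of the Bayes error, not the exact formulation of Parzen-Relief or MAP-Relief.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Sketch of classic binary-class Relief feature weighting.

    For each sampled instance, find its nearest neighbor of the same
    class (hit) and of the other class (miss), and reward features
    that separate the miss more than the hit. This is the greedy
    update that can be read as minimizing a k-NN (k=1) estimate of
    the Bayes error.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        xi, yi = X[i], y[i]
        # Squared Euclidean distances to all points; exclude self.
        dist = np.sum((X - xi) ** 2, axis=1)
        dist[i] = np.inf
        same, diff = (y == yi), (y != yi)
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(diff, dist, np.inf))
        # Margin-style update: features that keep the nearest miss
        # far and the nearest hit close accumulate larger weights.
        w += np.abs(xi - X[miss]) - np.abs(xi - X[hit])
    return w / n_iter
```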

Related Articles

Discriminative Clustering Based Feature Selection and Nonparametric Bayes Error Minimization and Support Vector Machines (SVMs)

In recent years, feature selection has become a prominent task in knowledge discovery in databases (KDD): it selects appropriate features from massive amounts of high-dimensional data. In an attempt to establish theoretical justification for feature selection algorithms, this work presents a theoretically optimal criterion, specifically the discriminative optimal criterion (DoC), for feature selection. Computa...

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid increase in the number of documents, Text Document Classification (TDC) methods have become a crucial matter. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier, IWO-NB, for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...

Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods includin...

Search Strategies for Binary Feature Selection for a Naive Bayes Classifier

In this paper, we compare several feature selection methods for the Naive Bayes Classifier (NBC) when the data under study are described by a large number of redundant binary indicators. Wrapper approaches, guided by the NBC's estimate of the classification error probability, outperform filter approaches while retaining a reasonable computational cost.
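
As a rough illustration of the wrapper idea described in this abstract, here is a hedged sketch of greedy forward selection guided by a cross-validated Naive Bayes error estimate. It assumes scikit-learn's BernoulliNB for the binary indicators; the feature budget and stopping rule are my assumptions, not the specific search strategies compared in the paper.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

def forward_select_nb(X, y, max_features=20, cv=5):
    """Greedy forward wrapper selection for a Naive Bayes classifier.

    At each step, add the feature whose inclusion most reduces the
    cross-validated NBC error estimate; stop when no candidate
    improves it or the feature budget is reached.
    """
    selected, best_err = [], 1.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        errs = []
        for j in remaining:
            cols = selected + [j]
            acc = cross_val_score(BernoulliNB(), X[:, cols], y, cv=cv).mean()
            errs.append((1.0 - acc, j))
        err, j = min(errs)
        if err >= best_err:
            break  # no candidate improves the error estimate
        best_err = err
        selected.append(j)
        remaining.remove(j)
    return selected, best_err
```

The wrapper cost is visible in the loop structure: each candidate feature triggers a full cross-validated fit, which is why filter methods are cheaper but, per the abstract, less accurate here.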

Feature Selection by Maximum Marginal Diversity

We address the question of feature selection in the context of visual recognition. It is shown that, besides being efficient from a computational standpoint, the infomax principle is nearly optimal in the minimum Bayes error sense. The concept of marginal diversity is introduced, leading to a generic principle for feature selection (the principle of maximum marginal diversity) of extreme computationa...
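
Marginal diversity is commonly defined as the class-prior-weighted KL divergence between a feature's class-conditional marginal density and its overall marginal. The histogram-based Python sketch below follows that definition; the binning scheme and smoothing constant are assumptions, not details from this excerpt.

```python
import numpy as np

def marginal_diversity(X, y, bins=32, eps=1e-12):
    """Rank features by marginal diversity: the class-prior-weighted
    KL divergence between each feature's class-conditional marginal
    and its overall marginal, estimated with shared-edge histograms.
    Larger values indicate more discriminative marginals.
    """
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    md = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        # Shared bin edges so the class-conditional and overall
        # histograms are directly comparable.
        edges = np.histogram_bin_edges(X[:, k], bins=bins)
        p_all, _ = np.histogram(X[:, k], bins=edges)
        p_all = p_all / p_all.sum()
        for c, prior in zip(classes, priors):
            p_c, _ = np.histogram(X[y == c, k], bins=edges)
            p_c = p_c / p_c.sum()
            md[k] += prior * np.sum(p_c * np.log((p_c + eps) / (p_all + eps)))
    return md
```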

Publication year: 2008